Add metadata to dataset config files #420
This looks very useful, thanks @jlamypoirier
Yes, this looks useful.
This is already global; the example just happens to have only one shard.
oleksost left a comment
There are some merge conflicts; apart from that, LGTM!
assert self.path.is_file(), f"File {self.path} does not exist."
return SampledDatasetConfig[SampleType].from_dict(self._convert_paths(yaml.safe_load(self.path.open("r"))))
config = yaml.safe_load(self.path.open("r"))
Assert.eq(config.keys(), {"config", "metadata"})
@jlamypoirier this assertion is causing crashes now. Remove it?
Yes, I must have left this there by accident.
✨ Description
Gather all available metadata from readers during dataset preparation and add it to the YAML config file. This works with both blending and splitting. Example:
The file format differs from before (the config is now nested inside a dict, under a "config" key next to "metadata", instead of sitting at the top level), but the old format can still be loaded.
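One way to picture the backward-compatible loading described above is the minimal sketch below. It is not the code from this PR: the helper name `split_config_and_metadata` and the sample field names are hypothetical, and the detection rule (treat a mapping as new-format only when its top-level keys are exactly "config" and "metadata", the two keys named in the assertion from the review thread) is an assumption. It works on an already-parsed mapping, such as the result of `yaml.safe_load`.

```python
from typing import Any

# Key names taken from the assertion quoted in the review thread.
NEW_FORMAT_KEYS = {"config", "metadata"}


def split_config_and_metadata(raw: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (config, metadata) from a parsed YAML mapping in either layout.

    Hypothetical helper: the real loader in the PR may detect the format differently.
    """
    if set(raw) == NEW_FORMAT_KEYS:
        # New layout: the dataset config is nested next to its metadata.
        return raw["config"], raw["metadata"] or {}
    # Old layout: the whole mapping is the dataset config and there is no metadata.
    return raw, {}


# Illustrative values only; these field names are made up for the example.
old_style = {"type": "memmap", "path": "shard_0"}
new_style = {"config": old_style, "metadata": {"num_documents": 10}}

assert split_config_and_metadata(old_style) == (old_style, {})
assert split_config_and_metadata(new_style) == (old_style, {"num_documents": 10})
```

Falling back to the old layout instead of asserting on the key set keeps previously written files loadable, which matches the discussion above about the strict key check causing crashes.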